Domain shift across crowd data severely hinders crowd counting models from generalizing to unseen scenarios. Although domain-adaptive crowd counting approaches close this gap to a certain extent, they still depend on target domain data to adapt (e.g. fine-tune) their models to the specific domain. In this paper, we aim to train a model on a single source domain that generalizes well to any unseen domain. This falls into the realm of domain generalization, which remains unexplored in crowd counting. We first introduce a dynamic sub-domain division scheme which divides the source domain into multiple sub-domains so that we can initiate a meta-learning framework for domain generalization. The sub-domain division is dynamically refined during meta-learning. Next, in order to disentangle domain-invariant information from domain-specific information in image features, we design domain-invariant and domain-specific crowd memory modules to re-encode image features. Two types of losses, i.e. feature reconstruction and orthogonal losses, are devised to enable this disentanglement. Extensive experiments on several standard crowd counting benchmarks, i.e. SHA, SHB, QNRF, and NWPU, show the strong generalizability of our method.
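As a rough illustration of the two disentanglement losses, here is a minimal sketch; the names, the additive combination of the two re-encoded parts, and the exact orthogonality formulation are assumptions, not the paper's definitive implementation.

```python
import torch
import torch.nn.functional as F

def disentanglement_losses(feat, inv_feat, spec_feat):
    """Hypothetical sketch of the feature reconstruction and orthogonal losses.

    feat:      original image feature, shape (B, C)
    inv_feat:  feature re-encoded by the domain-invariant memory, (B, C)
    spec_feat: feature re-encoded by the domain-specific memory, (B, C)
    """
    # Feature reconstruction loss: the two re-encoded parts should jointly
    # recover the original feature (here assumed to combine by summation).
    recon = F.mse_loss(inv_feat + spec_feat, feat)

    # Orthogonal loss: the invariant and specific parts should carry
    # non-overlapping information, i.e. be (near-)orthogonal.
    inv_n = F.normalize(inv_feat, dim=1)
    spec_n = F.normalize(spec_feat, dim=1)
    ortho = (inv_n * spec_n).sum(dim=1).pow(2).mean()

    return recon, ortho
```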
In computer vision, there is significant research interest in assessing potential demographic bias in deep learning models. One of the main causes of such bias is imbalance in the training data. In medical imaging, the potential impact of bias is arguably much greater, yet it has received less interest. In medical imaging pipelines, segmentation of structures of interest plays an important role in estimating clinical biomarkers that subsequently inform patient management. Convolutional neural networks (CNNs) are starting to be used to automate this process. We present the first systematic study of the impact of training set imbalance on racial and sex bias in CNN-based segmentation. We focus on segmenting cardiac structures from short-axis cine cardiac magnetic resonance images and train CNN segmentation models with different levels of race/sex imbalance. We find no significant bias in the sex experiment, but significant bias in two separate race experiments, highlighting the need to consider adequate representation of different demographic groups in health datasets.
Perspective distortion and crowd variation make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architectures in deep neural networks (DNNs). Multi-scale branches can be directly merged (e.g. by concatenation) or merged under the guidance of proxies (e.g. attention) in the DNN. Despite their prevalence, these combination methods are not sophisticated enough to cope with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchy, an expert competition and collaboration scheme is presented to encourage contributions from experts at all scales; a pixel-wise soft gating network is introduced to provide pixel-wise soft weights for scale combination at different levels of the hierarchy. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by locally integrating the former. Jointly optimizing the two can be potentially conflicting. We introduce a new relative local counting loss based on the relative count differences among hard-to-predict local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves state-of-the-art performance on five public datasets, i.e. ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and TRANCOS.
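A minimal sketch of the pixel-wise soft gating idea follows; the module name, the 1x1-conv gating head, and the single-level merge shown here are illustrative assumptions rather than the paper's exact hierarchical architecture.

```python
import torch
import torch.nn as nn

class PixelSoftGate(nn.Module):
    """Illustrative sketch: merge multi-scale density maps with per-pixel
    soft weights predicted by a small gating head on shared image features."""

    def __init__(self, feat_channels, num_scales):
        super().__init__()
        self.gate = nn.Conv2d(feat_channels, num_scales, kernel_size=1)

    def forward(self, feats, density_maps):
        # feats: (B, C, H, W) shared features
        # density_maps: list of num_scales tensors, each (B, 1, H, W)
        weights = torch.softmax(self.gate(feats), dim=1)       # (B, S, H, W)
        stacked = torch.cat(density_maps, dim=1)                # (B, S, H, W)
        merged = (weights * stacked).sum(dim=1, keepdim=True)   # (B, 1, H, W)
        return merged
```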
The goal of space-time video super-resolution (STVSR) is to increase both the frame rate (also referred to as the temporal resolution) and the spatial resolution of a given video. Recent approaches solve STVSR with end-to-end deep neural networks. A popular solution is to first increase the frame rate of the video, then perform feature refinement among different frame features, and finally increase the spatial resolution of these features. In this process, the temporal correlation among features of different frames is carefully exploited. However, the spatial correlation among features of different (spatial) resolutions has not been emphasized. In this paper, we propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations among features of different frames and spatial resolutions. Specifically, a spatial-temporal frame interpolation module is introduced to interpolate low-resolution and high-resolution intermediate frame features simultaneously and interactively. Spatial-temporal local and global refinement modules are then deployed to exploit the spatial-temporal correlation among different features for refinement. Finally, a novel motion consistency loss is employed to enhance the motion continuity among reconstructed frames. We conduct experiments on three standard benchmarks, i.e. Vid4, Vimeo-90K and Adobe240, and the results demonstrate that our method improves over state-of-the-art methods by a considerable margin. Our code will be available at https://github.com/yuezijie/STINet-Space-time-Video-Super-resolution.
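To make the motion continuity idea concrete, here is a heavily hedged sketch of one possible motion consistency term; approximating motion by temporal frame differences is an assumption for illustration only, and the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def motion_consistency_loss(pred_frames, gt_frames):
    """Hedged sketch of a motion consistency term: the motion in the
    reconstructed video (approximated here by frame differences) is
    encouraged to match the motion in the reference video.

    pred_frames, gt_frames: (B, T, C, H, W) reconstructed / reference videos.
    """
    pred_motion = pred_frames[:, 1:] - pred_frames[:, :-1]
    gt_motion = gt_frames[:, 1:] - gt_frames[:, :-1]
    return F.l1_loss(pred_motion, gt_motion)
```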
Dam reservoirs play an important role in meeting sustainable development goals and global climate targets. However, particularly for small dam reservoirs, consistent data on their geographical location are lacking. To address this data gap, a promising approach is automated dam reservoir extraction based on globally available remote sensing imagery. It can be regarded as a fine-grained variant of water body extraction, which involves extracting water areas in an image and then separating dam reservoirs from natural water bodies. We propose a novel deep-neural-network (DNN) based pipeline that decomposes dam reservoir extraction into water body segmentation and dam reservoir recognition. Water bodies are first separated from background land in a segmentation model, and each water body is then predicted to be either a dam reservoir or a natural water body in a classification model. For the former step, point-level metric learning across images is injected into the segmentation model to resolve the contour ambiguity between water and land regions. For the latter step, prior-guided metric learning with cluster-based triplets is injected into the classification model to optimize the image embedding space in a fine-grained manner according to reservoir clusters. To facilitate future research, we establish a benchmark dataset with Earth imagery data and human-labelled reservoirs from river basins in West Africa and India. Extensive experiments are conducted on this benchmark for the water body segmentation task, the dam reservoir recognition task, and the joint dam reservoir extraction task. Superior performance is observed in the respective tasks when comparing our method with state-of-the-art approaches.
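The sketch below illustrates the general flavour of point-level metric learning across two images; the sampling scheme, the point budget, and the triplet formulation are assumptions and not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def point_level_metric_loss(emb_a, mask_a, emb_b, mask_b,
                            num_points=256, margin=0.5):
    """Hedged sketch: pixel embeddings of water points from two images are
    pulled together, while water/land pairs are pushed apart by a margin.

    emb_*:  (C, H, W) pixel embeddings of each image
    mask_*: (H, W) binary water masks (1 = water, 0 = land)
    """
    def sample(emb, mask, label, n):
        idx = torch.nonzero(mask.flatten() == label).squeeze(1)
        idx = idx[torch.randperm(idx.numel())[:n]]
        return F.normalize(emb.flatten(1)[:, idx].t(), dim=1)  # (n, C)

    anchor = sample(emb_a, mask_a, 1, num_points)    # water points, image A
    positive = sample(emb_b, mask_b, 1, num_points)  # water points, image B
    negative = sample(emb_b, mask_b, 0, num_points)  # land points, image B

    n = min(anchor.size(0), positive.size(0), negative.size(0))
    return F.triplet_margin_loss(anchor[:n], positive[:n], negative[:n],
                                 margin=margin)
```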
Zero-shot learning (ZSL) aims to recognize classes that have no samples in the training set. A representative solution is to directly learn an embedding function that associates visual features with the corresponding class semantics to recognize new classes. Many methods extend this solution, and recent ones are particularly keen on extracting rich features from images, e.g. attribute features. These attribute features are normally extracted within each individual image; however, the common traits of features across images are not emphasized. In this paper, we propose a new framework to boost ZSL by explicitly learning attribute prototypes beyond images and contrastively optimizing them with attribute-level features within images. Besides the novel architecture, two elements are highlighted for attribute representations: a new prototype generation module is designed to generate attribute prototypes from attribute semantics; a hard-example-based contrastive optimization scheme is introduced to reinforce attribute-level features in the embedding space. We explore two alternative backbones, including a CNN-based one, to build our framework and conduct experiments on three standard benchmarks (CUB, SUN, AwA2). Results on these benchmarks demonstrate that our method improves the state of the art by a considerable margin. Our code will be available at https://github.com/dyabel/coar-zsl.git
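For intuition, here is a hedged sketch of a hard-example-based contrastive objective between attribute-level features and attribute prototypes; the temperature, the number of hard negatives, and the InfoNCE-style form are assumptions, not the paper's definitive loss.

```python
import torch
import torch.nn.functional as F

def hard_contrastive_loss(attr_feats, prototypes, attr_labels,
                          temperature=0.1, num_hard=16):
    """Hedged sketch: each attribute-level feature is pulled towards its own
    attribute prototype and pushed away from its hardest negative prototypes.

    attr_feats:  (N, D) attribute-level features extracted from images
    prototypes:  (A, D) learned attribute prototypes
    attr_labels: (N,)   index of the attribute each feature belongs to
    """
    feats = F.normalize(attr_feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    sim = feats @ protos.t() / temperature           # (N, A)

    pos = sim.gather(1, attr_labels.view(-1, 1))     # similarity to own prototype
    neg = sim.scatter(1, attr_labels.view(-1, 1), float('-inf'))
    hard_neg, _ = neg.topk(num_hard, dim=1)          # hardest negative prototypes

    logits = torch.cat([pos, hard_neg], dim=1)
    targets = torch.zeros(logits.size(0), dtype=torch.long,
                          device=logits.device)      # the positive sits at index 0
    return F.cross_entropy(logits, targets)
```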
In visual recognition tasks, few-shot learning requires the ability to learn object categories from only a few support examples. With the development of deep learning, its resurgence has mainly been in image classification. This work focuses on few-shot semantic segmentation, which remains a largely unexplored field. Some recent advances are usually restricted to single-class few-shot segmentation. In this paper, we first present a novel multi-way (class) encoding and decoding architecture which effectively fuses multi-scale query information and multi-class support information into one query-support embedding. Multi-class segmentation is directly decoded upon this embedding. For better feature fusion, a multi-level attention mechanism is proposed within the architecture, which includes the attention for support feature modulation and the attention for multi-scale combination. Finally, to enhance the embedding space learning, an additional pixel-wise metric learning module is introduced with a triplet loss formulated on the pixel-level embeddings of the input images. Extensive experiments on the standard benchmarks PASCAL-5i and COCO-20i show clear benefits of our method over the state of the art.
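As a rough illustration of how multi-class support information might modulate query features, here is a minimal sketch; the attention layout, the use of class prototypes, and the decoding head are assumptions for illustration, not the described architecture.

```python
import torch
import torch.nn as nn

class SupportModulation(nn.Module):
    """Illustrative sketch: query features attend to multi-class support
    prototypes, and the attended support information is fused into a single
    query-support embedding from which multi-class masks are decoded."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.decode = nn.Conv2d(dim, num_classes + 1, kernel_size=1)  # +1 for background

    def forward(self, query_feat, support_protos):
        # query_feat: (B, C, H, W); support_protos: (B, N_classes, C)
        b, c, h, w = query_feat.shape
        q = query_feat.flatten(2).transpose(1, 2)             # (B, HW, C)
        fused, _ = self.attn(q, support_protos, support_protos)
        fused = (q + fused).transpose(1, 2).reshape(b, c, h, w)
        return self.decode(fused)                              # multi-class logits
```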
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which suppresses graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
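The property-feature selection step can be sketched as follows; the use of scikit-learn's mutual information estimator and the function name are illustrative assumptions, since MGTAB's exact information-gain computation may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def top_k_property_features(X, y, k=20):
    """Hedged sketch: rank candidate user property features by information
    gain with respect to the labels and keep the top k.

    X: (num_users, num_candidate_features) property feature matrix
    y: (num_users,) account labels (e.g. bot / human, or stance classes)
    """
    gains = mutual_info_classif(X, y, random_state=0)
    top_idx = np.argsort(gains)[::-1][:k]   # indices of the k most informative features
    return top_idx, gains[top_idx]
```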
Image virtual try-on aims at replacing the clothes on a person image with a garment image (in-shop clothes), and has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images; however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth is warped onto an unreasonable body part. Based on this in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which can generate semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts sharpened semantic parsing on the try-on person. Aided by semantics guidance and pose priors, textures of various complexities are selectively blended with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is utilized to synthesize the final try-on image and learn de-occlusion jointly. In comparison to state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
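To give a feel for the copy-and-paste occlusion simulation, here is a minimal sketch under simplifying assumptions; the blending weight, the NumPy formulation, and the part-selection interface are illustrative and much simpler than the actual semantically-guided mixup module.

```python
import numpy as np

def semantic_copy_paste(image, parsing, texture, part_ids, alpha=0.8):
    """Hedged sketch: simulate an occlusion by blending texture from another
    source onto the pixels of selected human-parsing parts of the try-on image.

    image:    (H, W, 3) try-on image, float in [0, 1]
    parsing:  (H, W)    semantic parsing map with part labels
    texture:  (H, W, 3) texture/garment image used as the occluder
    part_ids: iterable of parsing labels to occlude (e.g. an arm)
    """
    mask = np.isin(parsing, list(part_ids))[..., None].astype(image.dtype)
    occluded = image * (1 - alpha * mask) + texture * (alpha * mask)
    return occluded, mask
```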
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
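A minimal sketch of the mixed-frequency idea follows; the dimensions, the number of principal components, and the Q-network architecture are illustrative assumptions, not the paper's algorithm as specified.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# Hedged sketch: compress the high-frequency covariates with PCA and feed the
# principal components, together with the low-frequency covariates, into a
# Q-network over the treatment actions.

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)  # one Q-value per candidate treatment

# Illustrative synthetic data:
# high_freq: (n, d_high) e.g. frequently sampled sensor readings per decision point
# low_freq:  (n, d_low)  e.g. baseline covariates
high_freq = np.random.randn(256, 120)
low_freq = np.random.randn(256, 5)

pca = PCA(n_components=8).fit(high_freq)
state = np.hstack([low_freq, pca.transform(high_freq)])

q_net = QNetwork(state_dim=state.shape[1], num_actions=2)
q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
```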